Synchronization and Optimality for Multi-Armed Bandit Problems in Continuous Time

Authors

  • NICOLE EL KAROUI
  • IOANNIS KARATZAS
Abstract

We provide a complete solution to a general, continuous-time dynamic allocation (multi-armed bandit) problem with arms that are not necessarily independent or Markovian, using notions and results from time-changes, optimal stopping, and multi-parameter martingale theory. The independence assumption is replaced by condition (F.4) of Cairoli & Walsh. We also introduce a synchronization identity for allocation strategies, which is necessary and sufficient for optimality in the case of decreasing rewards, and which leads to the explicit construction of a strategy with all the important properties: optimality in the dynamic allocation problem, optimality in a dual (minimization) problem, and the "index-type" property of Gittins.

∗ Research supported by the U.S. Army Research Office under Grant DAAH 04-95-I0528. We are grateful to Prof. Daniel Ocone for finding, and correcting, an error in our original proof of Theorem 7.1.


Related articles

On Optimality of Greedy Policy for a Class of Standard Reward Function of Restless Multi-armed Bandit Problem

In this paper, we consider the restless bandit problem, one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. However, it is known to be PSPACE-hard to approximate to any non-trivial factor, so optimality is very difficult to obtain due to its high complexity. A natural method is to obtain the greedy policy considerin...


On the optimality of the Gittins index rule for multi-armed bandits with multiple plays

We investigate the general multi-armed bandit problem with multiple servers. We determine a condition on the reward processes sufficient to guarantee the optimality of the strategy that operates at each instant of time the projects with the highest Gittins indices. We call this strategy the Gittins index rule for multi-armed bandits with multiple plays, or briefly the Gittins index rule. We show...
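The selection rule described above is simple once the index values are in hand: at each decision instant, operate the m projects whose current Gittins indices are largest. The sketch below is a hypothetical illustration only; it assumes the index values have already been computed (computing them is the hard part, and the `index_values` list here is an invented input, not part of any paper's code).

```python
def gittins_rule(index_values, m):
    """Given the current Gittins index value of each arm, return the m
    arms to operate: those with the highest indices, ties broken in
    favor of the lower arm number."""
    order = sorted(range(len(index_values)),
                   key=lambda i: (-index_values[i], i))
    return sorted(order[:m])

# e.g. with indices [0.3, 0.9, 0.5, 0.7] and m = 2 servers,
# the rule plays arms 1 and 3.
```

The rule is "index-type" in Gittins's sense: each arm's priority depends only on that arm's own state, so coupling between arms enters only through which indices happen to be largest.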


On Optimality of Myopic Policy for Restless Multi-armed Bandit Problem with Non i.i.d. Arms and Imperfect Detection

We consider the channel access problem in a multi-channel opportunistic communication system with imperfect channel sensing, where the state of each channel evolves as a non-i.i.d. Markov process. This problem can be cast into a restless multi-armed bandit (RMAB) problem that is intractable due to its exponential computational complexity. A natural alternative is to ...


Budgeted Bandit Problems with Continuous Random Costs

We study the budgeted bandit problem, where each arm is associated with both a reward and a cost. In a budgeted bandit problem, the objective is to design an arm pulling algorithm in order to maximize the total reward before the budget runs out. In this work, we study both multi-armed bandits and linear bandits, and focus on the setting with continuous random costs. We propose an upper confiden...


Analysis of Thompson Sampling for the Multi-armed Bandit Problem

The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W. R. Thompson, dates back to 1933. This algorithm, referred to as Thompson Sampling, is a natural Bayesian algorithm. The basic idea is to choose an arm to pla...
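The "basic idea" described above — play each arm with probability proportional to its posterior probability of being best — can be sketched in a few lines for the standard Beta-Bernoulli case. This is a minimal illustrative sketch, not code from the cited analysis; the arm success probabilities and round count are invented for the example.

```python
import random

def thompson_sampling(arm_probs, rounds):
    """Beta-Bernoulli Thompson Sampling: each arm keeps a
    Beta(successes + 1, failures + 1) posterior.  Each round, draw one
    sample from every posterior and pull the arm whose sample is largest,
    then update that arm's counts with the observed Bernoulli reward."""
    n = len(arm_probs)
    successes = [0] * n
    failures = [0] * n
    total_reward = 0
    for _ in range(rounds):
        samples = [random.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n)]
        i = max(range(n), key=lambda k: samples[k])
        reward = 1 if random.random() < arm_probs[i] else 0  # Bernoulli pull
        successes[i] += reward
        failures[i] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures

random.seed(0)
total, s, f = thompson_sampling([0.2, 0.5, 0.8], 2000)
```

After enough rounds the posterior of the best arm concentrates, so its samples win most draws and exploration of the weaker arms tapers off automatically, without any tuned exploration schedule.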



Published: 1996